big data era
A survey on data‐efficient algorithms in big data era
The leading approaches in Machine Learning are notoriously data-hungry. Unfortunately, many application domains do not have access to big data because acquiring data involves a process that is expensive or time-consuming. This has triggered a serious debate in both the industrial and academic communities calling for more data-efficient models that harness the power of artificial learners while achieving good results with less training data and in particular less human supervision. In light of this debate, this work investigates the issue of algorithms’ data hungriness. First, it surveys the issue from different perspectives. Then, it presents a comprehensive review of existing data-efficient methods and systematizes them into four categories. Specifically, the survey covers solution strategies that handle data-efficiency by (i) using non-supervised algorithms that are, by nature, more data-efficient, (ii) artificially creating more data, (iii) transferring knowledge from data-rich domains into data-poor domains, or (iv) altering data-hungry algorithms to reduce their dependency on the number of samples, so that they perform well in the small-sample regime. Each strategy is extensively reviewed and discussed. In addition, emphasis is put on how the four strategies interplay with each other in order to motivate exploration of more robust and data-efficient algorithms. Finally, the survey delineates the limitations, discusses research challenges, and suggests future opportunities to advance the research on data-efficiency in machine learning.
Event Prediction in the Big Data Era: A Systematic Survey
Events are occurrences at specific locations and times, with specific semantics, that nontrivially impact either our society or nature, such as civil unrest, system failures, and epidemics. It is highly desirable to be able to anticipate the occurrence of such events in advance in order to reduce the potential social upheaval and damage caused. Event prediction, which has traditionally been prohibitively challenging, is now becoming a viable option in the big data era and is thus experiencing rapid growth. There is a large amount of existing work that focuses on addressing the challenges involved, including heterogeneous multi-faceted outputs, complex dependencies, and streaming data feeds. Most existing event prediction methods were initially designed to deal with specific application domains, though the techniques and evaluation procedures utilized are usually generalizable across different domains. However, it is imperative yet difficult to cross-reference the techniques across different domains, given the absence of a comprehensive literature survey for event prediction. This paper aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction in the big data era. First, a systematic categorization and summary of existing techniques are presented, which facilitate domain experts' searches for suitable techniques and help model developers consolidate their research at the frontiers. Then, a comprehensive categorization and summary of major application domains are provided. Evaluation metrics and procedures are summarized and standardized to unify the understanding of model performance among stakeholders, model developers, and domain experts in various application domains. Finally, open problems and future directions for this promising and important domain are elucidated and discussed.
Event Prediction in Big Data Era: A Systematic Survey
This paper has presented a comprehensive survey of existing methodologies developed for event prediction in the big data era. It provides an extensive overview of the event prediction challenges, techniques, applications, evaluation procedures, and future outlook, summarizing the research presented in over 200 publications, most of which were published in the last five years. Event prediction challenges, opportunities, and formulations have been discussed in terms of the event element to be predicted, including the event location, time, and semantics. We then proposed a systematic taxonomy of the existing event prediction techniques according to the formulated problems and the types of methodologies designed for the corresponding problems. We have also analyzed the relationships, differences, advantages, and disadvantages of these techniques from various domains, including machine learning, data mining, pattern recognition, natural language processing, information retrieval, statistics, and other computational models. In addition, a comprehensive and hierarchical categorization of popular event prediction applications has been provided that covers domains ranging from the natural sciences to the social sciences. Based upon the numerous historical and state-of-the-art works discussed in this survey, the paper concludes by discussing open problems and future trends in this fast-growing domain.
Embracing the Era of Deep, Small Data
For years, the business world has been enraptured by the concept of big data. But the era of big data will not last forever. In fact, the replacement knocking on the door is one that might sound counter-intuitive: small data. Conventional wisdom suggests that data aggregation will only increase in size and scale. With an ever-expanding consumer base with evolving tastes, and an explosion of connected devices and digital channels to create and extract data, how could it not? But as we reach the point where most forward-looking businesses have "digitally transformed" and successfully used the vast amount of data to their advantage, the foundation is shaking.
Advances in Bayesian methods for big data
In the Big Data era, many scientific and engineering domains are producing massive data streams, with petabyte and exabyte scales becoming increasingly common. Besides the explosive growth in volume, Big Data also has high velocity, high variety, and high uncertainty. These complex data streams require ever-increasing processing speeds, economical storage, and timely response for decision making in highly uncertain environments, and have raised various challenges to conventional data analysis. With the primary goal of building intelligent systems that automatically improve from experience, machine learning (ML) is becoming an increasingly important field for tackling big data challenges, with an emerging field of "Big Learning," which covers theories, algorithms, and systems for addressing big data problems. Bayesian methods have been widely used in machine learning and many other areas.
Huawei's Noah's Ark Lab: Preparing for the Big Data Era
Black holes are an ongoing area of research and discovery. However, many mysteries of the universe can be solved with Big Data analytics. Artificial intelligence (AI) is the killer application of Big Data analytics. Because AI builds on machine learning and Big Data analytics, the more data is available, the more intelligent it becomes, which leads to more widespread applications. In the future, you may own a smart robot or even a smart dog.
Machine learning, AI, and the next wave of legacy systems
Machine learning and AI are the evolution of what was considered revolutionary just a few years ago: gathering and making sense of large, previously silo-bound data volumes. During the first wave of the Big Data era, companies were just getting a handle on the distribution, variety, and monetary potential of their many silos. The focus at the time was on integration and recognition--the simple understanding of what data resided where, and what parts of the organization were accountable for it. This was no small challenge, and it's still happening in companies with large, distributed infrastructures and global teams. That first wave led to a new host of tools, including Hadoop, predictive analytics, and all the many innovations up and down the stack designed to integrate, centralize, process, store, and analyze Big Data.